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MULTIRESOLUTION  STOCHASTIC  MODELS, 
DATA  FUSION,  AND  WAVELET 
TRANSFORMS  ^ 

Kenneth  C.  Chou  2,  S.A.  Golden  ^  and  Alan  S.  Willsky^ 


Abstract 

In  this  paper  we  describe  and  analyze  a  class  of  multiscale  stochastic  pro¬ 
cesses  which  are  modeled  using  dynamic  representations  evolving  in  scale  based 
on  the  wavelet  transform.  The  statistical  structure  of  these  models  is  Marko¬ 
vian  in  scale,  and  in  addition  the  eigenstructure  of  these  models  is  given  by 
the  wavelet  transform.  The  implication  of  this  is  that  by  using  the  wavelet 
transform  we  can  convert  the  apparently  complicated  problem  of  fusing  noisy 
measurements  of  our  process  at  several  different  resolutions  into  a  set  of  decou¬ 
pled,  standard  recursive  estimation  problems  in  which  scale  plays  the  role  of 
the  time-like  variable.  In  addition  we  show  how  the  wavelet  transform,  which 
is  defined  for  signals  that  extend  from  —00  to  -|-oo,  can  be  adapted  to  yield  a 
modified  transform  matched  to  the  eigenstructure  of  our  multiscale  stochastic 
models  over  finite  intervals.  Finally,  we  illustrate  the  promise  of  this  method¬ 
ology  by  applying  it  to  estimation  problems,  involving  single  and  multi-scale 
data,  for  a  first-order  Gauss-Markov  process.  As  we  show,  while  this  process  is 
not  precisely  in  the  class  we  define,  it  can  be  well-approximated  by  our  models, 
leading  to  new,  highly  parallel,  and  scale-recursive  estimation  algorithms  for 
multi-scale  data  fusion.  In  addition  our  framework  extends  immediately  to  2D 
signals  where  the  computational  benefits  are  even  more  significant. 
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1  Introduction  and  Background 

Multiresolution  methods  in  signal  and  image  processing  have  experienced  a 
surge  of  activity  in  recent  years,  inspired  primarily  by  the  emerging  theory  of 
multiscale  representations  of  signals  and  wavelet  transforms  [1,  7,  8,  9,  12,  15, 
16,  31,  25].  One  of  the  lines  of  investigation  that  has  been  sparked  by  these 
developments  is  that  of  the  role  of  wavelets  and  multiresolution  representa¬ 
tions  in  statistical  signal  processing  [5,  6,  10,  11,  14,  26,  27,  28,  29,  30].  In 
some  of  this  work  (e.g.  [10,  11,  14,  24])  the  focus  is  on  showing  that  wavelet 
transforms  simplify  the  statistical  description  of  frequently  used  models  for 
stochastic  processes,  while  in  other  papers  (e.g.  [28,  29,  30,  5,  6,  26,  27])  the 
focus  is  on  using  wavelets  and  multiscale  signal  representations  to  construct 
new  types  of  stochastic  processes  which  not  only  can  be  used  to  model  rich 
classes  of  phenomena  but  also  lead  to  extremely  efficient  optimal  processing 
algorithms  using  the  processes’  natural  multiscale  structure.  The  contribu¬ 
tions  of  this  paper  lie  in  both  of  these  arenas,  as  we  both  construct  a  new  class 
of  multiscale  stochastic  models  (for  which  we  also  derive  new  and  efficient  al¬ 
gorithms)  and  demonstrate  that  these  algorithms  are  extremely  effective  for 
the  processing  of  signals  corresponding  to  more  traditional  statistical  models. 

In  [26,  27]  a  new  class  of  fractal,  1/f-like  stochastic  processes  is  con¬ 
structed  by  synthesizing  signals  using  wavelet  representations  with  coeffi¬ 
cients  that  are  uncorrelated  random  variables  with  variances  that  decrease 
geometrically  as  one  goes  from  coarse  to  fine  scales.  The  wavelet  transform, 
then,  whitens  such  signals,  leading  to  efficient  signal  processing  algorithms. 
The  model  class  we  describe  here  not  only  includes  these  processes  as  a  spe¬ 
cial  case  but  also  captures  a  variety  of  other  stochastic  phenomena  and  signal 
processing  problems  of  considerable  interest.  In  particular  by  taking  advan¬ 
tage  of  the  time-like  nature  of  scale,  we  construct  a  class  of  processes  that 
are  Markov  in  scale  rather  than  in  time.  The  fact  that  scale  is  time-like 
for  our  models  allows  us  to  draw  from  the  theories  of  dynamic  systems  and 
recursive  estimation  in  developing  efficient,  highly  parallelizable  algorithms 
for  performing  optimal  estimation.  For  our  models  we  develop  a  smoothing 
algorithm,  an  algorithm  which  computes  estimates  of  a  multiscale  process 
based  on  multiscale  data,  which  uses  the  wavelet  transform  to  transform  the 
overall  smoothing  problem  into  a  set  of  independently  computable,  small  ID 
standard  smoothing  problems. 

If  we  consider  smoothing  problems  for  the  case  in  which  we  have  measure- 
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ments  of  the  full  signal  at  the  finest  scale  alone,  this  algorithmic  structure 
reduces  to  a  modest  generalization  of  that  in  [27]-i.e.,  the  wavelet  transform 
whitens  the  measurements,  allowing  extremely  efficient  optimal  signal  pro¬ 
cessing.  What  makes  even  this  modest  contribution  of  some  significance  is 
the  richness  of  the  class  of  processes  to  which  it  can  be  applied,  a  fact  we 
demonstrate  in  this  paper.  Moreover  the  methodology  we  describe  directly 
yields  efficient  scale-recursive  algorithms  for  optimal  processing  and  fusion 
of  measurements  at  several  scales  with  only  minimal  increase  in  complexity 
as  compared  to  the  single  scale  case.  This  contribution  should  be  of  con¬ 
siderable  value  in  applications  such  as  remote  sensing,  medical  imaging,  and 
geophysical  exploration,  in  which  one  often  encounters  data  sets  of  different 
modalities  (e.g.  infrared  and  radar  data)  and  resolutions.  Furthermore,  al¬ 
though  we  focus  on  ID  signals  in  this  paper,  the  fact  that  scale  is  a  time-like 
variable  is  true  as  well  in  the  case  of  2D,  where  similar  types  of  models  lead 
to  efficient  recursive  and  iterative  algorithms;  the  computational  savings  in 
this  case  are  even  more  dramatic  than  in  the  case  of  ID. 

In  order  to  define  some  of  the  notation  we  need  and  to  motivate  the 
form  of  our  models,  let  us  briefly  recall  the  basic  ideas  concerning  wavelet 
transforms.  The  multiscale  representation  of  a  continuous  signal  f{x)  consists 
of  a  sequence  of  approximations  of  that  signal  at  finer  and  finer  scales  where 
the  approximations  of  f{x)  at  the  mth  scale  consists  of  a  weighted  sum  of 
shifted  and  compressed  (or  dilated)  versions  of  a  basic  scaling  function  ^{x): 

+  0O 

fm{x)=  ^  f{m,n)2^(f>{T^x  -  n)  (1) 

7l=  — CO 

For  the  (m  -f  l)st  approximation  to  be  a  refinement  of  the  mth,  we  require 
(j){x)  to  be  representable  at  the  next  scale: 

=  X]  V2h{n)<l)(2x  —  n)  (2) 

n 

As  shown  in  [8],  h(n)  must  satisfy  several  conditions  for  (1)  to  be  an  orthonor¬ 
mal  series  and  for  several  other  properties  of  the  representation  to  hold.  In 
particular  h{n)  must  be  the  impulse  response  of  a  quadrature  mirror  filter 
(QMF)  [8,  23],  where  the  condition  for  h(n)  to  be  a  QMF  is  as  follows. 

'£h{k)h{k-2n)  =  6n  (3) 

k 
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By  considering  the  incremental  detail  added  in  obtaining  the  (m  +  l)st 
scale  approximation  from  the  mth,  we  arrive  at  the  wavelet  transform  based 
on  a  single  function  '4>{x)  that  has  the  property  that  the  full  set  of  its  scaled 
translates  —  n)|  form  a  complete  orthonormal  basis  for  L"^.  In 

[8]  it  is  shown  that  (j)  and  ip  are  related  via  an  equation  of  the  form 


=  Yl^din)^{^x -n)  (4) 

n 

where  g{n)  and  h{n)  form  a  conjugate  mirror  filter  pair  [23],  and  that 

fm+iix)  =  fmix)  +  ^d(m,n)2”*''V(2”*a:  -  n)  (5) 

n 

Note  that  g{n)  and  h{n)  must  obey  the  following  algebraic  relationships. 

Y^g{k)h(k  -2n)  =  0  (6) 

k 

Y^g{k)g{k-2n)  =  (5„  (7) 

k 

h{n)h{n  -  2A;)  +  ^  g{n)g{n  -  2k)  =  (8) 

k  k 


If  we  have  the  coefficients  {/(m  +  1,  •)}  of  the  (m  +  l)st-scale  represen¬ 
tation  we  can  “peel  off”  the  wavelet  coefficients  at  this  scale  and  carry  the 
recursion  one  complete  step  by  calculating  the  coefficients  {/(m,  •)}  at  the 
next  coarser  scale.  The  resulting  wavelet  analysis  equations  are 

/(m,  n)  =  Y  -  k)f(m  +  1,  ^)  =  {Hmf{rn  -f  1,  •))„  (9) 

k 

d{m,n)  =  -  k)f{m  -f  1,  A:)  =  {Gmf{m  +  1,  •))„  (10) 

k 

where  the  operators  Hm  and  Gm  are  indexed  with  the  subscript  m  to  denote 
that  they  map  sequences  at  scale  m  -f-  1  to  sequences  at  scale  m^.  From 

^Note  that  for  the  case  of  infinite  sequences  the  operators  as  defined  here  are  precisely 
equivalent  for  each  scale;  i.e.  they  are  not  a  function  of  m.  However,  we  adhere  to  this 
notation  for  the  reasons  that  a)  we  may  allow  for  the  QMF  filter  to  differ  at  each  scale 
and  b)  for  the  case  of  finite-length  sequences  the  operators  are  in  fact  different  at  every 
scale  due  to  the  fact  that  the  the  number  of  points  differ  from  scale  to  scale. 
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(3,7,6)  we  have  the  following 


=  I 

(11) 

=  I 

(12) 

=  0 

(13) 

where  denotes  the  adjoint  of  the  operator. 

Reversing  this  process  we  obtain  the  synthesis  form  of  the  wavelet  trans¬ 
form  in  which  we  build  up  finer  and  finer  representations  via  a  coarse-to-fine 
scale  recursion: 

/(m  +  1,  n)  =  ^  h{2k  —  n)f{m,  ^  gi^k  —  n)d{m,  k)  (14) 

k  k 

Expressed  in  terms  of  the  operators  Hm  and  Gm  we  have 

/(m  -hi,  n)  =  •))n  +  •))„  (15) 

or 

=  I  (16) 

which  is  an  expression  of  eq.(8)  in  operator  form. 

Thus,  we  see  that  the  synthesis  form  of  the  wavelet  transform  defines 
a  dynamical  relationship  between  the  coefficients  f{m,  n)  at  one  scale  and 
those  at  the  next,  with  d(m,n)  acting  as  the  input.  Indeed  this  relationship 
defines  an  infinite  lattice  on  the  points  (m,n),  where  (m  +  1,  k)  is  connected 
to  (m,ra)  if  f(m,n)  influences  /(m  -f  1,^).  This  structure  is  illustrated  in 
Figure  1  for  the  case  where  h{n)  is  a  4-tap  filter,  where  each  level  of  the 
lattice  represents  an  approximation  of  our  signal  at  some  scale  m.  Note  that 
the  dynamics  in  (14)  are  now  with  respect  to  scale  rather  than  time,  and 
this  provides  us  with  a  natural  framework  for  the  construction  of  multiscale 
stochastic  models. 

In  particular  if  the  input  d(m,n)  is  taken  to  be  a  white  sequence,  then 
/(m,  n)  is  naturally  Markov  in  scale  (and,  in  fact,  is  a  Markov  random  field 
with  respect  to  the  neighborhood  structure  defined  by  the  lattice).  Indeed  the 
class  of  1/f-like  processes  considered  in  [26,  27]  is  exactly  of  this  form  with  the 
additional  specialization  that  the  variance  of  d{m,  n)  decreases  geometrically 
as  m  increases.  While  allowing  more  general  variance  structures  on  d(m,  n) 
expands  the  set  of  processes  we  can  construct  somewhat,  a  bit  more  thought 
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yields  a  substantially  greater  extension.  First  of  all,  with  wavelet  coefficients 
which  are  uncorrelated,  (14)  represents  a  first-order  recursion  in  scale  that 
is  driven  by  white  noise.  However,  as  we  know  from  time  series  analysis, 
white-noise-driven  first-order  systems  yield  a  comparatively  small  class  of 
processes  which  can  be  broadened  considerably  if  we  allow  higher-order  dy¬ 
namics,  which  can  either  be  captured  as  higher-order  difference  equations  in 
scale,  or,  as  we  do  here,  as  first-order  vector  difference  equations.  As  fur¬ 
ther  motivation  for  such  a  framework,  note  that  in  sensor  fusion  problems 
one  wishes  to  consider  collectively  an  entire  set  of  signals  or  images  from  a 
suite  of  sensors.  In  this  case  one  is  immediately  confronted  with  the  need  to 
use  higher-order  models  in  which  the  actual  observed  signals  may  represent 
samples  from  such  a  model  at  several  scales,  corresponding  to  the  differing 
resolutions  and  modalities  of  individual  sensors. 

Thus  the  perspective  we  adopt  here  is  to  view  multiscale  representations 
more  abstractly  than  in  the  wavelet  transform,  by  using  the  notion  of  a  state 
model  in  which  the  state  at  a  particular  node  in  our  lattice  captures  the 
features  of  signals  up  to  that  scale  that  are  relevant  for  the  “prediction”  of 
finer-scale  approximations.  As  we  will  see,  this  leads  us  to  a  model  class  that 
includes  the  wavelet  representation  of  (14)  as  a  special  case  and  that  leads 
to  extremely  efficient  processing  algorithms.  In  the  next  section  we  intro¬ 
duce  our  framework  for  state  space  models  evolving  in  scale,  and  we  show 
that  the  wavelet  transform  plays  a  central  role  in  the  analysis  of  the  eigen- 
structure  of  these  processes.  This  fact  is  then  used  in  Section  3  to  construct 
scale-recursive,  and  highly  parallel  algorithms  for  optimal  smoothing  for  such 
processes  given  data  at  possibly  a  number  of  different  scales.  In  Section  4 
we  then  investigate  an  important  issue  in  the  practical  application  of  these 
ideas,  namely  the  issue  of  applying  the  wavelet  transform  to  finite-length 
data.  The  typical  approach  is  to  base  the  transform  on  cyclic  convolutions 
rather  than  on  linear  convolutions  and  to  perform  the  scale  by  scale  recursion 
up  to  some  specified  coarse  scale.  We  present  a  more  general  perspective  on 
the  problem  of  adapting  the  wavelet  transform  to  finite-length  data  which  in¬ 
cludes  as  a  special  case  the  approach  using  cyclic  convolutions  as  well  as  other 
approaches  which  provide  modifications  of  the  wavelet  transform  to  provide 
Karhunen-Loeve  expansions  of  windowed  multiscale  processes.  In  Section  5 
we  illustrate  the  promise  of  our  multiscale  estimation  framework  by  applying 
it  to  smoothing  problems  for  Ist-order  Gauss-Markov  processes,  including 
problems  involving  the  estimation  of  such  processes  based  on  multiresolution 
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data.  Additional  experimental  results  for  other  processes,  including  1/f-like 
fractal  processes  can  be  found  in  [5]. 


2  Multiscale  Processes  on  Lattices 

In  this  section  we  define  our  class  of  multiscale  state  space  models  and  analyze 
their  eigenstructure.  We  develop  our  ideas  for  the  case  of  the  infinite  lattice, 
i.e.,  for  signals  and  wavelet  transforms  of  infinite  extent.  In  Section  4  we 
discuss  the  issue  of  adapting  the  wavelet  transform  and  our  results  to  the 
case  of  finite-length  data. 

Consider  an  infinite  lattice  corresponding  to  a  wavelet  whose  scaling  filter, 
h{n),  is  an  FIR  filter  of  length  P.  Recall  that  in  the  wavelet  transform  of  a 
signal  /  each  level  of  the  lattice  can  be  viewed  as  the  domain  of  an  P  sequence, 
namely  /(m,  •)  =  /(m).  In  our  generalization  of  the  dynamic  model  (15)  we 
think  of  each  level  of  the  lattice  as  corresponding  to  a  vector  P  sequence 
x{m)  =  where  x(m,n)  can  be  thought  of  as  the  vector  state  of  our 

multiscale  model  at  lattice  node  (m,  n)  and  x(m)  as  the  representation  at 
scale  m  of  the  phenomenon  under  study. 

To  motivate  our  general  model  let  us  first  consider  the  synthesis  equation 
(14)  driven  by  uncorrelated  wavelet  coefficients  d(m,n),  where  the  variances 
are  constant  along  scale  but  varying  from  scale  to  scale.  In  this  case  we 
obtain  the  following  stochastic  dynamic  state  model  where  we  define  the 
scale  index  m  from  an  initial  coarse  scale  jL  to  a  finest  scale,  M,  and  where 
we  assume  that  the  coarsest  scaling  coefficients  f{L,n)  are  uncorrelated. 
Thus,  with  x{m)  corresponding  to  /(m,  •)  and  w{m)  to  d(m,  •)  we  have  for 
m  =  L,L  1,...,M  —  1 


E[x{L)x{Lf] 

—  Al 

(17) 

II 

x{m  -f  1) 

=  +  G*^w{m) 

(18) 

E[w{i)w{i)'^] 

=  Ai 

(19) 

=  Xil  ,  i  =  L,L  +  -  1 

Note  that  if  we  let  A,-  =  cr^2~'^\  this  model  is  precisely  the  one  considered  in 
[26,  27]  for  modeling  a  l//-type  process  with  spectral  parameter  7. 
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The  model  class  we  consider  in  this  paper  is  a  natural  generalization 
of  (17)-(19).  In  specifying  our  model  we  abuse  notation  and  use  the  same 
notation  Hm,  Gm  for  the  coarsening  and  differencing  operators  defined  as  in 
(9,10)  where  f(m,n)  is  a  vector.  With  this  convention  our  model  takes  the 


following  form  for  m  =  L,  L  +  1,  ...M  —  1 

E[x(L)xiLf]  =  V^iL)  (20) 

x(m  +  1)  =  H^A{m l)x{m) B{rn -{■  l)w{rn -\- \)  (21) 

E[w{i)w{j)^]  =  Q{i)6i-j  ,  i  =  L  +  (22) 

where 

A{m)  =  diag{...,  A{rn),  ...A{rn), ...)  (23) 

B{m)  =  diag(...,  B{m),  ...)  (24) 

Q{m)  =  diag{...,Q{m),...Q{m),...)  (25) 

V^iL)  ^  diag{...,P,{L),...P,iL),...)  (26) 


and  where  A{m),  B{m),  Q(m),  and  Px{L)  a.Te  finite- dimensional  matrices  rep¬ 
resenting  the  system  matrix,  the  process  noise  matrix,  the  process  noise  co- 
variance  matrix,  and  the  initial  state  covariance  matrix,  respectively. 

If  we  let  x{m,n)  and  w{m,n)  denote  the  components  of  x{m)  and  w{m), 
respectively,  the  model  (21)  can  be  written  in  component  form  as 

a:(m  -f- 1,  n)  =  ^  h{2k  —  n)A{m)x{m,  k)  -f  B{m)w{m,  n)  (27) 

k 

where  the  tx;(m,  n)  are  white  with  covariance  Q{rn).  Comparing  (21),  (27) 
to  (14),  (18)  allows  us  to  make  several  observations.  In  particular,  (21)  is 
indeed  a  generalization  of  (18).  This  might  not  appear  to  be  obvious  since 
the  driving  noise  term  in  (21)  does  not  involve  the  differencing  operator 
However,  if  we  examine  (21)  and  define 

/^("^)  =  Gl,w{m)  (28) 

then,  thanks  to  (7)  or  (12)  and  the  fact  that  the  covariance  of  w(m,n)  varies 
with  m  but  not  n,  we  find  that  the  fi{m,n)  are  white  with  covariance  \m^, 
that  varies  with  m  but  not  n.  That  is,  (18)  is  exactly  of  the  form  of  (21), 
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with  x(m,  n),  w{m,n)  scalar  and  A{m)  =  B{m)  =  1.  By  augmenting  our 
model  class  to  include  finite-dimensional  state  vectors  defined  on  lattices,  we 
allow  for  the  possibility  of  higher-order  models.  This  allows  us  to  consider  a 
considerably  richer  class  of  processes  which  is  parametrized  by  the  matrices 
corresponding  to  our  state  model.  Note  also  that  this  model  bears  resem¬ 
blance  to  Laplacian  Pyramid  schemes[4]  where  the  added  detail  in  going  from 
one  scale  to  the  next  is  not  constrained  by  the  differencing  operator  Gm- 
As  mentioned  in  the  introduction,  the  model  (17)-(19)  yields  a  covari¬ 
ance  with  eigenstructure  given  by  the  wavelet  transform,  and  it  is  this  fact 
that  is  used  in  [26,  27]  to  develop  efficient  processing  algorithms  for  l//-like 
processes.  As  we  now  state,  the  same  is  true  for  our  more  general  model 
(21),  where  in  this  generalization  we  focus  on  what  can  be  thought  of  as 
the  “block”  eigenstructure  of  the  process  x{m).  That  is,  if  x{m,n)  is  d- 
dimensional,  the  discrete  wavelet  transform  of  the  signal  x{m,  •)  for  each  m 
yields  an  uncorrelated  set  of  random  vectors.  To  see  this,  let  us  examine  the 
covariance  R^xirn)  of  x{m),  where,  if  we  use  the  fact  that  the  block-diagonal 
operators  A{m),  B{m),  Q{rn),  VxiL)  and  their  adjoints  commute  with  the 
operators 

Rxx{tn)  =  E[x{m)x{m)'^]  (29) 

m—l  t=L 

=  (¥(m  -  1,  L)Vx{L)r{m  -  1,  L)){  U  H  Hi) 

izzL  m—l 

m—l  m—l  k 

+  x:  W)eWe-(t)F(m -!,*:))(  n  ffn(  II  Hi) 

k=L+\  i=k  i=m—l 

+  B{m)Q{m)B*{m) 

where  for  i  >  j 

Let  us  next  define  a  sequence  of  block  unit  vectors  as  follows. 

=  (31) 

ith 

where  the  superscript  j  is  used  to  denote  that  the  vector  (in  corresponds 

to  the  jth  scale  of  the  lattice  and  where  Id  is  the  d  x  d  identity  matrix  (and 
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Od  the  d  X  d  zero  matrix).  Note  that  in  the  present  context  the  superscript 
j  is  completely  superfluous  notation.  However,  in  Section  4  we  consider  the 
extension  of  the  ideas  presented  here  and  in  the  next  section  to  the  case  of 
finite  length  signals.  In  this  case  the  (finite)  lengths  of  the  vectors  x(m) 
vary  with  m,  as  will  the  dimensions  of  the  block  unit  vectors  analogous  to 
(31).  As  we  will  see,  with  changes  in  definitions  of  quantities  such  as  Sf,  the 
following  result  on  the  block-eigenstructure  of  i2ri(m),  as  well  as  the  results 
of  Section  3,  hold  in  essentially  unchanged  form  for  the  finite  length  case. 


Lemma  2.1  The  block  vectors  yf(m),  v!^{m)  for  I  =  —  1  and  for 

i,n  £  Z  are  block-eigenvectors  of  the  correlation  matrix  at  scale  m,  Rxx{rn), 
where 


and 

m— 1 

«f(m)  ^  (  n 

m— 1 

(32) 

The  following  holds: 

4(™)  =  (  n 

»=/+l 

(33) 

Rxx{m)v^{m) 

=  diag{...,  A£,(m),  ...Ai(m),  ...)i;f  (m) 

(34) 

Rxx['rn)v^^(m) 

=  diag{...,  A/(m),  ...A;(m),  ...)v^Jjn) 

(35) 

for  I  =  L,...m  —  1,  i,n  £  Z  where  AL(m),A/(m)  are  d  x  d  matrices  of  the 
form 


mk,L)B{k)Q{k)B^{k)^^{k,L))  ^m 

m 

A;(m)  =  mkJ)B{k)Q{k)B^{k)^^{kJ)) 

k-l+l 

where 

Hi,j)  =  {{,..^,.  ,  (38) 

^  (  A(^)$(^  -  l,j)  I  >  j  ^  ^ 

The  details  of  the  proof  of  this  result  can  be  found  in  [5].  Rather  than 
present  these  details  here,  we  provide  a  simple  interpretation  of  them  which 


(36) 

(37) 
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will  also  be  of  value  in  understanding  the  optimal  estimation  algorithms 
presented  in  the  next  section.  Specifically,  for  m  =  —  1  define  the 

following  transformed  quantities,  where  j  in  (39)  runs  from  I  through  m  —  1 
and  k  from  — oo  to  +  oo  in  (39),  (40)  : 

-jArn)  =  ivi{m)Axim)  (39) 

ulA'^)  =  (^fc("^))^3:(m)  (40) 

From  (15)  and  (32,  33)  we  see  that  what  we  have  done  is  to  take  the 
partial  discrete  wavelet  transform  decomposition  of  each  component  of  the 
d-dimensional  x(m,  n)  viewed,  for  m  fixed,  as  a  discrete-time  signal  with  time 
index  n.  That  is,  starting  with  x{m,  •)  we  have  first  peeled  off  the  finest  level 
of  the  wavelet  transform  Zm-lAA^  viewed  as  a  discrete  signal  with  index 
k,  and  then  have  computed  successively  coarser  wavelet  coefficients,  through 
zlA'^)i  producing  the  coarse  scaling  coefficients 

What  the  lemma  states  is  that  all  of  these  variables,  i.e.,  the  set  of  d- 
dimensional  vectors  2yfc(m)  and  ui^ki'nd)  for  all  values  oi  j  and  k  are  mutually 
uncorrelated.  Indeed  much  more  is  true,  in  that  these  variables  in  fact  evolve 
in  a  statistically  decoupled  manner  as  a  function  of  m.  Specifically,  if  we 
transform  both  sides  of  (21)  as  in  (39),  (40),  and  use  (11)-(13)  and  the 
commutativity  of  the  operators  in  (23)-(26)  with  and  we  obtain  the 
following  transformed  dynamics  for  m  =  —  1.  First,  the  coarsest 

signal  components  evolve  according  to 

ulA‘^  +  ^)  =  A{m  +  l}uL^kim)  +  B{m  +  +  1)  (41) 

where  r£,^yt(m  +  1)  is  the  coarse  wavelet  approximation  of  w(m  +  1,  •),  i.e.,  ^ 

+  1)  =  (ufc("z  d- l))^u;(m -f  1)  (42) 

and  where  the  initial  condition  for  (41)  is  simply  the  coarse  scale  signal  itself: 

ULAL)=x{L,k)  (43) 

^Note  that  we  are  again  abusing  notation  since  w(m,  n)  may  have  dimension  q  ^  d.  In 
this  case  the  only  needed  modifications  are  to  use  ^-dimensional  identity  and  zero  matrices 
in  (31)  and  similar  q-dimensional  versions  of  the  operators  Hk  and  Gk- 
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Next,  the  wavelet  coefficiets  at  different  scales  evolve  according  to 

Zj,kim  +  1)  =  A{m  +  +  B{m  +  +  1)  (44) 

for  j  =  L, m  —  1,  and  where 

+  1)  =  {vk{m  +  l))^u;(m  +  1)  (45) 

are  the  corresponding  wavelet  coefficients  of  w(m  -I- 1,  ■).  Finally,  as  we  move 
to  scale  m  +  1  from  scale  m  we  must  initialize  one  additional  finer  level  of 
wavelet  coefficients: 

Zm,k{^  +  ^)  =  B{m  +  l)s^^k{m  +  1)  (46) 

What  we  have  determined  via  this  transformation  is  a  set  of  decoupled 
ordinary  dynamic  systems  (41),  (44)  where  this  set  is  indexed  by  k  in  (41) 
and  by  j  =  L,  ...m  —  1  and  k  in  (44),  where  m  plays  the  role  of  the  “time” 
variable  in  each  of  these  models,  and  where  at  each  “time”  m  + 1  we  initialize 
a  new  set  of  models  as  in  (46).  More  importantly,  thanks  to  (11)-(13)  and 
(25),  (26)  the  initial  conditions  and  driving  noises,  x{L,k),  r£,,^;(m  +  1),  and 
■5j,fc(m  +  1)  are  mutually  uncorrelated  with  covariances  Px{L),  Q{m  +  1),  and 
Q{m  +  1),  respectively,  so  that  these  models  are  statistically  decoupled  as 
well.  From  this  fact  the  lemma  essentially  follows  immediately,  with  the 
identification  of  Ai:,(m)  as  the  covariance  of  uz,,fc(m)  (for  any  value  of  fc), 
which  evolves  according  to 

A/:,(m4-l)  =  A{m  +  l)XL{m)A^ {rn-\-l)-\- B{m  +  l)Q{m  +  l)B^{rn-\-l)  (47) 
with  initial  condition 

\l{L)  =  Px{L)  (48) 

Similarly,  Xi{m)  is  the  covariance  of  zi^ki'm)  which  also  evolves  according  to 
a  Lyapunov  equation  exactly  as  in  (47),  but  from  initial  condition  at  the 
(/  +  l)st  scale: 

X,{1  -\-l)  =  B{l  +  l)Q{l  +  l)B'^{l  +  1)  (49) 

3  Wavelet-Based  Multiscale  Optimal  Smooth¬ 
ing 

In  this  section  we  consider  the  problem  of  optimally  estimating  one  of  our 
processes  as  in  (21)  given  sensors  of  varying  SNR’s  and  differing  resolutions. 
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An  example  where  this  might  arise  is  in  the  case  of  fusing  data  form  sensors 
which  operate  in  different  spectral  bands.  We  formulate  this  sensor  fusion 
problem  as  an  optimal  smoothing  problem  in  which  the  optimally  smoothed 
estimate  is  formed  by  combining  noisy  measurements  of  our  lattice  process  at 
various  scales.  In  other  words  each  sensor  is  modeled  as  a  noisy  observation 
of  our  process  at  some  scale  of  the  lattice. 

Consider  the  following  multiscale  measurements  for  m  =  L,  X  +  1,  ...M. 

y{m)  =  C{m)x{m)  +  v{m)  (50) 

where 

C(m)  =  diag{...,C{m),...C{m),...)  (51) 

TZ{m)  =  diag{...,R{m),...R{m),...)  (52) 

E[v{i)v{j)'^]  =  (53) 

and  where  C{m)  is  a,  b  x  d  matrix  and  R{m)  is  a,  b  x  b  matrix  representing 
the  covariance  of  the  additive  measurement  noise..  Note  that  the  number  of 
sensors,  the  resolution  of  each  sensor,  and  the  precise  spectral  characteristics 
of  each  sensor  are  represented  in  the  matrix  C'(m).  For  example  if  there  were 
simply  one  sensor  at  the  finest  scale,  then  C{m)  =  0  for  all  m  except  m  =  M. 

We  define  the  smoothed  estimate,  denoted  as  x®(m),  to  be  the  expected 
value  of  x{m)  conditioned  on  y{i)  for  z  =  i,  L  +  1,  ...M;  i.e. 

x*(m)  =  E[x{m)\y{L),  ...y{M)]  (54) 

We  define  the  coarse-to-fine  filtered  estimate,  to  be  the  expected  value  of 
x(m)  conditioned  on  y{i)  for  i  =  L  +  I,  ...m;  i.e 

x(m|m)  =  E[x{m)\y{L),  ...y{m)]  (55) 

We  define  the  coarse-to-fine  one-step  predicted  estimate  to  be  the  ex¬ 
pected  value  of  x(m)  conditioned  on  y{i)  for  i  =  L,L  +  1,  ...m  —  1;  i.e 

x(m|m  —  1)  =  E[x{m)\y{L)^  ...y{m  —  1)]  (56) 

From  standard  Kalman  filtering  theory,  we  can  derive  a  recursive  filter 
with  its  associated  Riccati  equations,  where  the  recursion  in  the  case  of  our 
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lattice  models  is  in  the  scale  index  m.  We  choose  to  solve  the  smoothing 
problem  via  the  Rauch- Tung-Striebel(RTS)  algorithm[22].  This  gives  us  a 
correction  sweep  that  runs  recursively  from  fine  to  coarse  scales  with  the 
initial  condition  of  the  recursion  being  the  final  point  of  the  Kalman  filter. 
The  following  equations  describe  the  “down”  sweep,  i.e.  the  filtering  step 


from  coarse  to  fine  scales. 

For  m  —  L, 

^(m|m  —  1)  =  H^_^A{m)x{m  —  l\m  —  1)  (57) 

x{m\m)  =  :r(m|m  —  1) -f /C(m)[y(m)  —  C(m)£(m|m  —  1)]  (58) 

/C(m)  =  'P(m|m  —  l)C*(m)<S(m)  (59) 

S{m)  =  {C{m)V{m\m— +  Tt{rn))~^  (60) 

'P(m|m  — 1)  =  H^_-^^A{rn)V{Tn  —  l\rn  —  l)A*{rn)Hrn-\ 

-f-  B{m)Q{m)B*{rn)  (61) 

7^~^(m|m)  =  —  \)  ^  C*{m)Tt~^{m)C{rn)  (62) 

with  initial  conditions 

x{L\L-l)  =  0  (63) 

V{L\L-l)  =  V.{L)  (64) 


We  also  have  the  following  equations  for  the  correction  sweep  of  the  Rauch- 
Tung-Striebel  algorithm,  i.e.  the  “up”  sweep  from  fine  to  coarse  scales. 

For  m  =  M  —  1,  M  —  2,  ...L  -t- 1,  L: 

x^(m)  =  x{m\m) +  V{m\m)A*{m l\rn)[x^{rn —  x{m l\rn)] 


(65) 

V^{m)  —  V{m\m) —  V{m -{■  l\rn)\E*{rn)  (66) 

E{m)  =  'P(m|m).4*(m l)JJ„i'P“^(m -f  l|m)  (67) 

with  initial  conditions 

x^{M)  =  x{M\M)  (68) 

V\M)  =  V{M\M)  (69) 


Note  that  we  could  equally  have  chosen  to  start  the  RTS  algorithm  going  from 
fine  to  coarse  scales  followed  by  a  correction  sweep  from  coarse  to  fine,  i.e. 


13 


an  up-down  rather  than  the  down-up  algorithm  just  described.  This  involves 
defining  the  filtered  and  one-step  predicted  estimates  in  the  direction  from 
fine  to  coarse  rather  than  coarse  to  fine.  Similarly  we  could  also  construct  a 
so-called  two-filter  algorithm  [22]  consisting  of  parallel  upward  and  downward 
filtering  step.  Details  on  these  variations  are  found  in  [5]. 

The  smoothing  algorithm  described  to  this  point  involves  a  single  smoother 
for  the  entire  state  sequence  x(m)  at  each  scale.  However,  if  we  take  advan¬ 
tage  of  the  eigenstructure  of  our  process  and,  more  specifically,  the  decoupled 
dynamics  developed  at  the  end  of  the  preceding  section,  we  can  transform 
our  smoothing  problem  into  a  set  of  decoupled  ID  RTS  smoothing  prob¬ 
lems  which  can  be  computed  in  parallel.  Specifically,  define  the  following 
transformed  quantities. 


%fc(m|m  -  1) 

A 

x{m\m  —  1) 

(70) 

Pj^k{m\m  —  1) 

A 

{vi{m)AV{m\m  -  l)u^(m) 

(71) 

Zj,k{m\m) 

A 

(72) 

Pj,k{m\m) 

A 

{vl{m)AV{m\m)vi{m) 

(73) 

UL,k{^\'^  -  1) 

A 

1) 

(74) 

PlA'^I'^  “  1) 

A 

(u^(7n))^P(m|m  -  l)ufc(m) 

(75) 

UL,k{rn\m) 

A 

(t;f'(m))^x(m|m) 

(76) 

A:,fc(»7^|m) 

A 

{vArn)fV{m\m)vAm) 

(77) 

Aki^) 

A 

(u^(m))V(m) 

(78) 

AW 

A 

(79) 

A 

{vAm)fx\m) 

(80) 

PL.kM 

A 

{vAm)fV^{m)vAm) 

(81) 

These  quantities  represent  the  transformed  versions  of  the  predicted,  filtered, 
and  smoothed  estimates  in  the  Rauch-Tung-Striebel  algorithm,  along  with 
their  respective  error  covariances,  in  the  transform  domain.  We  also  need 
to  represent  the  transformed  data,  where  the  data  at  each  scale,  y{m)  has 
components  which  are  finite-dimensional  vectors  of  dimension  6x1.  We 
represent  these  vectors  using  eigenvectors  as  in  (32),  (33),  where  in  this  case 
the  blocks  in  (31)  are  6x6: 
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yi,k{'<^)  =  y{m)  (82) 

yL,k{m)  =  {v^{m)fy{m)  (83) 

As  before  for  each  scale  m,  where  m  =  L  +  1,  ...M,  the  index  ranges  are  j  = 
L, m  —  1  and  — oo  <  k  <  oo.  That  is,  for  each  m  other  than  at  the  coarsest 
scale,  L,  we  transform  our  quantities  so  that  they  involve  eigenvectors  whose 
coarsest  scale  is  L. 

We  now  can  state  the  following,  which  follows  directly  from  the  analysis 
in  the  preceding  section  : 

Algorithm  3.1  Consider  the  smoothing  problem  for  a  lattice  defined  over  a 
finite  number  of  scales,  labeled  from  coarse  to  fine  as  m  =  L,L  +  1,  ...M .  The 
following  set  of  equations  describes  the  solution  to  the  smoothing  problem, 
transformed  onto  the  space  spanned  by  the  eigenvectors  of  Rxx{m),  in  terms 
of  independent  standard  Rauch-Tung  Striebel  smoothing  algorithms. 

DOWN  SWEEP: 

For  j  ■=  L,L  -\r  1,  ...M  —  2  and  k  G  Z: 

ij,fc(m|m  —  1)  =  A{m)zj^k{'ni  -  l\m  -  1)  (84) 

—  1)  =  A{m)Pj^k{^  —  l|w  —  l)A^(m)  +  B{m)Q{m)B^{m) 

(85) 

m  =  j  +  2,j  +  3,...M 

with  the  initial  conditions  for  j  =  L,  L  1,  ...M  —  1  and  k  £  Z 

4fc(i  +  lii)  =  0  (86) 

Pj,k{j  +  Mj)  =  B{j  +  l)Qij  +  l)B^{j  +  l)  (87) 


For  j 

=  L,LA  -  1 

and  k  £  Z: 

^hk{m\ 

m)  =  Zj^k{rn\m  — 

1)  +  ^^j,ki^){yj,ki^)  -  C'(m)4fe(m|m  - 

-1)) 

(88) 

1)  +  C'^(m 

)R  ^{m)C{m) 

(89) 

m  =  i  +  l,j+2. 

...M 

KU 

m)Vi{m) 

(90) 
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For  k  ^  Z  we  have 


UL,k{,'m\m  —  1)  =  A{m)uL^k{m  —  l\m  —  1)  (91) 

PL,k{m\m  —  l)  =  A{rri)pL^k{rn  —  l\rn —  l)A?^{m)  +  B{rn)Q{rn)B'^{rn) 

(92) 

m  —  Z(  +  1,  iy  -|-  2,  ...M 

with  the  initial  conditions 


UL,k{L\L-l)  =  0  (93) 

pL,k{L\L-l)  =  P,{L)  (94) 

For  k  £  Z  we  have 

UL,k{m\m)  =  UL^k{rn\rri- 1)  +  KL^k{rn){yL,k{rn)  -  C{rri)uL,kirn\rn  -  1)) 

=  PLl{m\m-l)  +  C'^{m)R-\m)C{m)  (96) 

m  =  L,L  +  l,...M 

I<lA^)  =  (^i("*))^^(”^)Vfc(m)  (97) 

UP  SWEEP: 

For  j  =  L,  L  +  1,  ...M  —  1  and  k  £  Z 


Ak(^)  =  kk{m\m)  +  P,-,*(m|m)y4^(m  +  l)Pj(m  +  l\m)[zlk{m  +  1)  -  ^^(m  +  l|m)] 


Pj,fc(m|m)  +  EjA'i'n)  [Pj,k{‘^  +  1)  “  PjA^  + 

(98) 

P!M)  = 

(99) 

p3,kpn)  = 

Pj,;i;(mim)y4^(m  +  l)P~^(m  +  l|m) 

(100) 

m  = 

(101) 

with  initial  condition 

4k(M)  = 

(102) 

PiAM)  = 

(103) 
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desired  at  some  scale  L  <  m  <<  M,  we  use  the  wavelet  synthesis  equation 
(15)  to  construct  it  from  its  coarse  scale  approximation  and  its  finer 

scale  wavelet  coefficients  z^^j.  for  scales  j  =  —  1.  Thus,  the  overall 

procedure  is  of  complexity  0{lN). 

Let  us  make  several  closing  comments.  First,  as  the  preceding  complexity 
analysis  implies,  the  algorithm  we  have  described  can  be  adapted  to  finite 
length  data,  with  appropriate  changes  in  the  eigenvectors/ wavelet  transform 
to  account  for  edge  effects.  This  extension  is  discussed  in  the  next  sec¬ 
tion.  Secondly,  note  that  if  only  finest-scale  data  are  available  (i.e.  only 
C{M)  ^  0),  our  smoothers  degenerate  to  coefficient-by-coefficient  static  es¬ 
timators  (i.e.  each  wavelet  coefficient  in  (82)  ,  (83),  at  scale  m  =  M  is 
used  separately  to  estimate  the  corresponding  component  of  x{M)),  which 
is  an  algorithm  of  exactly  the  same  structure  as  that  in  [27]  for  the  particu¬ 
lar  choice  of  parameters  in  the  scalar  version  of  our  model  corresponding  to 
1  / /-like  processing. 

Finally,  it  is  important  to  note  that  the  transform  method  of  parallelizing 
the  smoothing  problem,  used  here  and  in  [27],  requires  the  matrix  C{m) 
in  (50)  to  have  constants  along  the  diagonal  for  all  m,  i.e.  that  the  same 
measurements  are  made  at  all  points  at  any  particular  scale.  The  case  of 
missing  data  at  a  given  scale  is  an  example  in  which  this  structure  is  violated. 
This  is  relevant  to  situations  in  which  one  might  want  to  use  coarse  data  to 
interpolate  sparsely  distributed  fine  data.  This  problem  can  be  handled  via 
an  alternate  set  of  efficient  algorithms  using  models  based  on  homogeneous 
trees.  We  refer  the  reader  to  [5,  30]  for  details. 


4  Finite  Length  Wavelet  Transforms 

In  this  section  we  discuss  the  problem  of  adapting  the  wavelet  transform, 
thus  far  defined  only  for  infinite  sequences,  to  the  case  of  finite-length  se¬ 
quences,  i.e.  producing  a  transform  that  maps  finite-length  sequences  into 
finite-length  sequences.  This  is  a  topic  of  considerable  current  interest  in 
the  wavelets  literature,  [31],  as  the  effects  of  windowing  in  wavelet  trans¬ 
forms  are  not  as  well-understood  as  those  for  Fourier  analysis.  To  begin, 
note  that  both  the  analysis  and  synthesis  equations,  (9,10,14),  for  comput¬ 
ing  the  wavelet  and  scaling  coefficients  are  defined  as  operations  on  infinite 
length  sequences.  Adapting  these  equations  to  the  case  of  finite  length  se- 
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For  k  £  Z 


PU^) 


Uj^k(m\m)  +  Pj,k{m\m)A'^{m  +  l)PjJ{m  +  l\m)[z^-j^{m  +  1)  -  Uj,k{'<^  +  1|^)] 


Pj-fc(m|m)  +  E'^Am)  [P/fc(m  +  1)  -  Pj,k{m  +  l|m)]  {m) 

Pj^k{m\m)A^{m  +  l)P~A'm  +  l|m) 

M  -  l,M  -2,...LfI,L 


(104) 

(105) 

(106) 
(107) 


with  initial  condition 


SIAM)  =  ulAM\M)  (108) 

PLkm  =  PlAM\m)  (109) 


Note  that  our  algorithm  is  highly  efficient  in  that  we  have  transformed  the 
problem  of  smoothing  what  are,  in  principle,  infinite  dimensional  or,  in  the 
case  of  windowed  data,  very  high-dimensional  vectors,  to  one  of  smoothing 
in  parallel  a  set  of  finite  dimensional  vectors.  Also,  the  smoothing  procedure 
takes  place  in  scale  rather  than  in  time,  and  for  finite  data  of  length  N 
this  interval  is  at  most  of  order  logN,  since  each  successively  coarser  scale 
involves  a  decimation  by  a  factor  of  2.  Note  also  that  as  we  move  to  finer 
scales  we  pick  up  additional  levels  of  detail  corresponding  to  the  new  scale 
components  (46)  introduced  at  each  scale.  This  implies  that  the  smoothers 
in  our  algorithm  smooth  data  over  scale  intervals  of  differing  lengths:  of 
length  roughly  logN  for  the  coarsest  components  (since  data  at  all  scales 
provide  useful  information  about  coarse  scale  features)  to  shorter  length  scale 
intervals  for  the  finer  scale  detail  (since  data  at  any  scale  is  of  use  only  for 
estimating  detail  at  that  scale  or  at  coarser  scales,  but  not  at  finer  scales). 
This  structure  is  illustrated  in  Figure  2. 

Let  us  next  analyze  the  complexity  of  our  overall  algorithm  for  smoothing 
our  lattice  processes.  We  first  transform  our  data  using  the  wavelet  trans¬ 
form  which  is  fast:  0{lN)  where  N  is  the  number  of  points  at  the  finest 
scale  and  /  is  the  length  of  the  QMF  filter.  We  then  perform  in  parallel 
our  ID  smoothers.  Even  if  these  smoothers  are  computed  serially  the  total 
computation  is  0{lN).  After  performing  the  parallel  ID  smoothers  on  these 
transformed  variables  an  additional  inverse  transformation  is  required,  which 
is  performed  again  using  the  inverse  wavelet  transform.  That  is  if  x^{m)  is 


17 


quences  while  preserving  both  the  orthogonality  and  the  invertibility  of  the 
transformation  proves  to  be  non-trivial  for  the  following  reason.  Take  a  10- 
point  sequence,  a:(n),  and  consider  performing  its  wavelet  transform  using  a 
QMF  filter,  h{n),  of  length  four.  To  compute  the  scaling  coefficients  at  the 
next  coarsest  scale  we  apply  (9)  to  x(n),  resulting  in  a  scaling  coefficient  se¬ 
quence,  c(n),  which  is  of  length  6  (the  linear  convolution  results  in  a  13-point 
sequence,  while  the  downsampling  by  a  factor  of  two  reduces  this  to  a  6-point 
sequence).  Similarly,  by  applying  (10)  to  x{n)  we  get  a  wavelet  coefficient 
sequence,  d(n),  which  is  also  of  length  6.  Thus,  the  overall  transformation 
from  the  nonzero  portion  of  {x(n)}  to  the  nonzero  portions  of  {c(n),d(n)} 
in  this  case  is  a  map  from  71^^  to  71^^,  which  makes  it  impossible  for  it  to 
be  invertible.  This  example  is  illustrated  Figure  3,  where  x{n)  is  defined  as 
indicated  on  the  first  level  of  a  truncated  lattice  and  {c(n),d{n)}  are  mapped 
into  the  second  level  where  the  lattice  branches  are  illustrated  for  the  case 
where  the  operators  Hi,Gi  correspond  to  a  QMF  filter,  h{n),  of  length  four 
and  only  branches  connecting  to  points  in  the  nonzero  portion  of  x(n)  are 
shown. 

Thus,  we  can  already  see  the  fundamental  problem  in  trying  to  develop 
an  orthonormal  matrix  transformation  based  on  the  wavelet  transform.  At 
each  scale  we  must  have  a  well-defined  orthonormal  transformation  from  our 
approximation  at  that  scale  into  its  scaling  coefficients  and  its  wavelet  coeffi¬ 
cients  at  the  next  coarsest  scale.  To  see  how  this  can  be  done  it  is  sufficient  to 
focus  on  our  previous  example  involving  the  map  from  x{n)  into  {c(n),  d(n)}. 
We  can  write  the  transformation  in  our  example  explicitly  as  follows.  We 
denote  our  4-tap  QMF  filter,  /i,  as  a  row  vector  [  ho  hi  h^  ho  ].  Sim¬ 
ilarly,  our  filter,  g,  is  denoted  as  [  go  gi  g2  go  ]  where  from  (6)  -  (8)  a 

valid  choice  of  g  is 

[50  gi  52  53  ]  =  [  ho  — /i2  hi  —ho  ]  (110) 

If  we  think  of  the  non-zero  portion  of  our  sequence  x{n)  as  a  vector,  x,  in 

7Z^^  and  the  non-zero  portions  of  c(n),d(n)  as  vectors,  c  and  d,  in  7Z^,  our 
maps  x{n)  c{n)  and  x{n)  d{n)  can  be  thought  of  as  the  following  6x10 
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matrices. 


where 


h2  0  0  0 

ho  hi  /i2  ho  0 

0  0  ho  ho 

0  0  0  0  ho 

0  0  0  0  0 

.0  0  0  0  0 

'  92  93  0  0  0 

9o  9i  92  93  0 

0  0  9o  9i  92 

0  0  0  0  50 

0  0  0  0  0 

0  0  0  0  0 


0  0  0  0  0' 

0  0  0  0  0 

ha  0  0  0  0 

hi  ho  ho  0  0 

0  ho  hi  ho  ho 
0  0  0  ho  hi 

0  0  0  0  0  ' 

0  0  0  0  0 

53  0  0  0  0 

9i  92  93  0  0 

0  9o  9i  92  93 

0  0  0  5o  5i  . 


c  =  Hx 
d  =  Gx 


(111) 


(112) 


(113) 

(114) 


Note  that  c  and  d  are  precisely  the  non-zero  portions  of  the  sequences 
one  obtains  by  applying  the  operators  Hi,Gi  to  x{n).  Thus,  we  can  in  fact 
reconstruct  x{n)  from  c,d  using  our  synthesis  equation,  eq.(16).  In  matrix 
notation 

x  =  H^c  +  G^d  (115) 

If  we  denote  our  overall  map  x  c,d  as  the  12  x  10  matrix 


H 

G 


(116) 


then  (115)  says  that  U^U  =  I.  Note,  however,  that  it  is  not  the  case  that 
UU'^  =  /,  since  U  is  not  even  square.  That  is,  while  it  is  true  that  the  finite 
dimensional  version  of  (16),  namely  H  G^G  =  I,  holds,  the  following 
conditions  do  not  hold: 


=  / 

(117) 

GG^ 

=  I 

(118) 

GH^ 

=  0 

(119) 
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The  failure  of  these  conditions  to  hold  is  due  primarily  to  the  first  and  last 
rows  of  H  and  G.  In  Figure  3  these  correspond  to  the  averaging  performed 
at  the  edges  of  both  ends  of  the  lattice.  Note  that  the  rows  of  H  are  mutu¬ 
ally  orthogonal  and  the  rows  of  G  are  mutually  orthogonal.  The  reason  for 
(117,118)  is  simply  the  fact  that  the  edge  rows  of  H  and  G  are  not  normal¬ 
ized  so  that  their  inner  products  equal  one.  The  reason  for  (119)  is  the  fact 
that  the  edge  rows  of  G  are  not  orthogonal  to  the  edge  rows  of  H. 

If  we  want  our  overall  transformation,  U,  to  be  orthonormal,  we  must 
somehow  eliminate  two  of  its  rows.  Note  that  if  we  eliminate  the  first  and 
last  rows  of  the  matrix  H  we  get 


H 


A 


'  ho  hi  h2  hs  0  0  0  0  0  0  ‘ 

0  0  ho  hi  h2  0  0  0  0 

0  0  0  0  ho  hi  /i2  ho  0  0 

.0  0  0  0  0  0  ho  hi  ho  ho 


(120) 


In  this  case  (117)  and  (119)  do  hold  with  H  replacing  H,  but  (118)  does 
not  quite  hold  due  to  the  fact  that  the  the  first  and  last  rows  of  G  are  not 
properly  normalized.  Examining  G  in  detail  and  using  the  QMF  property  in 
(7)  we  see  that 


GG'^  = 


a  0  0  0  0  0  ■ 

0  1  0  0  0  0 

0  0  1  0  0  0 

0  0  0  1  0  0 

0  0  0  0  1  0 

0  0  0  0  0  6 


(121) 


where 


“  =  gl  +  al 

l>  =  9c +9^ 


(122) 

(123) 


Thus,  we  can  satisfy  (118)  simply  by  normalizing  the  first  and  last  rows  of 
G  hy  a  and  6,  respectively. 

The  resulting  transformation  maps  our  length  10  signal  x  into  scaling 
coefficients  c  of  length  4  and  wavelet  coefficients  d  of  length  6.  This  has  the 
following  interpretation.  While  U  maps  the  nonzero  portion  of  x{n)  into  the 
nonzero  portion  of  its  wavelet  coefficients,  d(n),  at  the  next  coarsest  scale. 
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normalizing  the  coefficients  at  the  edges,  it  maps  the  nonzero  portion  of  x(n) 
into  the  nonzero  portion  of  its  scaling  coefficients,  c(n),  while  zeroing  the  two 
scaling  coefficients  at  the  edges.  Note  that  if  we  perform  our  transformation 
recursively  in  scale,  at  scale  each  scale  we  end  up  with  scaling  coefficients 
which  are  zeroed  at  the  edges,  leaving  us  with  fewer  and  fewer  scaling  coef¬ 
ficients  as  we  go  to  coarser  scales.  If  we  take  our  example  one  step  coarser 
in  scale,  i.e.  we  apply  the  same  idea  used  to  create  U  on  the  scaling  coeffi¬ 
cients  c,  we  end  up  mapping  c  into  one  scaling  coefficient  and  three  wavelet 
coefficients  at  the  next  coarsest  scale.  The  overall  two  scale  decomposition 
results  in  scaling  coefficients  defined  on  the  lattice  in  Figure  4.  The  resulting 
wavelet  coefficients  reside  on  the  lattice  in  Figure  5,  where  the  dotted  lines 
represent  averaging  at  the  edges  due  to  the  normalization  of  the  gi's. 

Note  that  if  we  consider  a  QMF  pair  of  length  greater  than  4,  there 
are  more  edge  rows  of  G,  and  the  required  modification  to  these  is  more 
than  simple  normalization.  For  example  if  the  filter  is  of  length  6,  then  the 
corresponding  H  operator,  with  the  edge  rows  removed  has  the  form 


ho 

hi 

^2 

^3 

/l4 

hs 

0 

0 

0 

0 

0 

ho 

hi 

h2 

hz 

hi 

hz 

0 

0 

H  = 

0 

0 

0 

ho 

hi 

hz 

hz 

hi 

hz 

0 

0 

0 

0 

0 

0 

0 

ho 

hi 

hz 

hz 

hi 

hz 

the  corresponding  G  matrix. 

including  the  edge  rows. 

is 

94 

95 

0 

0 

0 

0 

0 

0 

0 

92 

93 

94 

95 

0 

0 

0 

0 

0 

9o 

9i 

92 

93 

94 

95 

0 

0 

0 

0 

0 

9o 

9i 

92 

93 

94 

95 

0 

0 

G  = 

0 

0 

0 

90 

9\ 

92 

93 

94 

95 

0 

0 

0 

0 

0 

0 

0 

9o 

9i 

92 

93 

94 

95 

0 

0 

0 

0 

0 

0 

0 

90 

9i 

92 

93 

0 

0 

0 

0 

0 

0 

0 

0 

0 

9o 

9i 

(124) 


(125) 


The  point  now  is  that  each  of  the  two  pairs  of  edge- rows  in  (125)  is  not 
only  not  normalized  but  also  not  an  orthogonal  pair.  Consequently  we  must 
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Gram-Schmidt  orthonormalize  each  of  these  pairs.  This  changes  the  values 
of  the  nonzero  coefficients  in  the  edge  rows  but  does  not  introduce  additional 
nonzero  entries,  so  that  the  local  nature  of  the  wavelet  calculations  is  still 
preserved.  More  generally,  if  we  have  a  QMF  of  length  P  (which,  to  satisfy 
the  QMF  conditions,  must  be  even),  we  must  perform  two  Gram-Schmidt 
orthonormalizations  of  sets  of  P/2  vectors. 

Note  that  the  coefficients  d{n)  and  c{n)  play  a  symmetric  role  in  our 
procedure,  and  thus  we  could  equally  well  have  zeroed  the  edges  of  of  our 
wavelet  coefficients  d{n)  rather  than  our  scaling  coefficients  c(n)  or  could 
have  zeroed  out  the  scaling  coefficients  at  one  end  of  the  signal  and  the 
wavelet  coefficients  at  the  other.  In  addition  there  are  other  possible  ways 
in  which  to  modify  the  edge  rows  of  H  and  G  to  achieve  orthogonality,  the 
most  common  being  cyclic  wrap-around.  We  refer  the  reader  to  [31]  and  [5] 
for  further  discussion  of  these  variations,  as  we  focus  here  on  the  one  we  have 
just  described,  as  it  is  this  form  that  yields  the  correct  eigendecomposition 
for  a  windowed  version  of  the  state  model  described  in  the  preceding  section. 

In  particular,  we  specify  our  model  on  a  finite  lattice  as  follows  for  m  = 


L,L  +  -  1, 

E[x{L)x{Lf]  =  V.{L)  (126) 

a;(m -f  1)  =  H^A{m  +  l)x{m)  +  B{m  +  l)w{m  +  1)  (127) 

E[w{i)w{j)'^]  =  ,  i  =  L  +  (128) 

where 

A{rn)  =  diag{A{rn)^  ...A{rn))  (129) 

B{m)  =  diag{B{m),  ...B{m))  (130) 

Q{m)  =  diag{Q{m),...Q{m))  (131) 

V.{L)  =  diag{P,{L),...P4L))  (132) 


Here  A{m),  B{m),Q{m),  and  Px{L)  are  as  before,  x{m)  and  w{m)  represent 
the  finite  vectors  of  variables  x(m,n)  and  w{m,  n),  respectively,  at  the  finite 
set  of  nodes  at  the  mth  scale,  and  and  Gm  are  the  counterparts  of  the 
operators  and  Gm  adapted  to  the  case  of  finite  intervals  by  removing  edge 
rows  of  Bm  and  orthonormalizing  those  of  Gm  (note  that  here  we  again  allow 
these  wavelet  operators  to  act  on  vector  signals  component-by-component). 
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Note  that  the  dynamics  (127)  are  not  square,  since  over  a  finite  interval  we 
increase  the  number  of  scaling  coefficients  as  we  move  from  coarse  to  fine 
scales.  For  example  if  scale  L  consists  of  a  single  root  node  and  if  we  use 
QMF’s  of  length  4,  our  dynamics  evolve  on  the  finite  lattice  of  Figure  4  from 
coarse  to  fine  scales,  yielding  a  stochastic  process  at  a  sequence  of  resolutions 
on  a  finite  interval.  As  we  have  indicated,  the  block-eigenstructure  of  our 
finite-lattice  process  is  precisely  as  we  derived  in  the  previous  section,  except 
that  we  now  must  use  the  modified  wavelet  transform  on  a  finite  interval,  as 
determined  by  the  sequence  of  operators  To  make  this  precise,  let 

f{m)  denote  the  number  of  nodes  on  our  finite  lattice  at  scale  m  =  L,...,  M, 
where  for  a  length  P  QMF  we  can  easily  check  that 

f{i  +  1)  =  2/(0  +  P  -  2  (133) 

Define  the  block  unit  vectors 

<51  =  [0d,...,0d,^,0d,...0d]'^  (134) 

ith 

where  the  superscript  j  is  again  used  to  denote  that  the  vector  (now  in 
7^/(i)xrf^  corresponds  to  the  jth.  scale  of  the  lattice  and  where  Id  is  the  d  x  d 
identity  matrix  (and  Od  the  d  x  d  zero  matrix).  The  block  vectors  vfim), 
u^(m)  for  /  =  L,  ...m  —  1  and  for  i  =  0, 1,2.../(P)  —  1  and  n  =  0, 1, 2. ../(/)  —  1 
are  block-eigenvectors  of  the  correlation  matrix  of  the  process  at  scale  m, 
Rxx{m)i  where 


TO— 1 

vfim)  ^  (  n  Hj)Sf 

j=L 

(135) 

m—1 

vLim)  ^  ( n  moisL 

i=l+l 

(136) 

As  we  did  for  the  infinite  case  we  can  now  transform  the  smoothing  problem 
using  a  wavelet  basis  composed  of  the  block  vectors  vf{m)  and  v!^(m).  Our 
transformed  variables  are  formed  as  in  eq.’s(70-81),  except  that  now  we  have 
a  finite  number  of  variables  to  estimate.  In  particular  for  each  scale  index, 
j,  the  translation  index  k  ranges  from  0  to  f{j)  —  1.  The  wavelet  transform 
smoothing  algorithm  developed  in  the  preceding  section  then  applies. 
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5  Numerical  Examples 

In  this  section  we  illustrate  the  use  of  our  multiscale  estimation  framework 
for  solving  estimation  problems  involving  both  single  scale  as  well  as  mul¬ 
tiscale  data.  We  do  this  by  focusing  on  problems  involving  estimation  of 
first-order  Gauss-Markov  processes.  We  have  chosen  this  process  as  it  is  a 
frequently-used  and  well-understood  and  accepted  model  and  it  cannot  be 
exactly  modeled  using  the  multiresolution  framework.  Thus  we  can  demon¬ 
strate  the  richness  of  our  models  in  approximating  well-known  processes  by 
comparing  the  performance  of  our  smoother,  using  model  parameters  chosen 
so  as  to  well-approximate  the  Gauss-Markov  process,  with  the  performance 
of  standard  smoothers.  Our  examples  indicate  that  our  multiscale  models 
do  rather  well  both  in  modeling  important  classes  of  processes  and  as  the 
basis  for  constructing  computationally  efficient  algorithms.  For  first-order 
Gauss-Markov  processes  there,  of  course,  already  exist  efficient  estimation 
algorithms  (Wiener  and  Kalman  filters).  However,  these  algorithms  apply 
only  in  the  case  of  pointwise  measurement  data.  On  the  other  hand,  our 
multiscale  modeling  framework  allows  us  to  incorporate  data  at  a  set  of  res¬ 
olutions  with  no  increase  in  algorithmic  complexity.  We  demonstrate  the 
potential  of  this  capability  by  fusing  multiscale  data  for  the  estimation  of  a 
Gauss-Markov  process,  illustrating  how  the  use  of  coarse-scale  data  can  aid 
in  estimating  features  which  are  not  discernible  using  fine-scale  data  of  poor 
quality.  We  refer  the  reader  to  [5]  for  other  examples  of  the  application  of 
our  framework,  including  the  fusion  of  multiscale  data  for  the  l//-processes 
introduced  in  [26,  27] 

5.1  Processes  and  Multiscale  Models 

Consider  the  following  stationary  Ist-order  Gauss-Markov  process. 

x{t)  =  -I3x{t)  +  w{t)  (137) 

E[x\t)]  =  1  (138) 

This  process  has  the  following  correlation  function  and  associated  power 
spectral  density  function. 

(139) 
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ijj^  +  p 


(140) 


In  the  numerical  examples  that  follow  we  use  a  discretized  version  of 
(137).  In  particular  we  use  a  sampled  version  of  (137)  in  which  the  sampling 
interval  is  small  enough  to  minimize  any  aliasing  effects.  We  choose  /3  =  1 
and  take  the  sampling  rate  to  be  twice  ojq  where  5'a;a;(a;o)  =  .002,  Sxx{^)  being 
the  power  spectral  density  function  of  x{t).  This  yields  a  sampling  interval 
of  A  =  tt/wo  where  a;o  =  30.  Our  discretized  model  is  as  follows. 


a;(t  +  l)  =  ax{t)  +  w{t)  (141) 

E[x^{t)]  =  1  (142) 

a  =  ~  .9006  (143) 

We  consider  the  following  measurements  of  a;(t). 

y{t)  =  x{t)  +  v{t)  (144) 

E[v\t)]  =  R  (145) 

Y  =  {y{t)\t  =  0,...N-l}  (146) 


In  the  examples  that  follow  we  take  the  interval  length  N  =  128. 

Figure  6  is  a  gray-scale  image  of  the  covariance  matrix  of  our  stationary 
first  order  Gauss-Markov  process  defined  on  a  finite  interval,  corresponding 
to  the  model  in  (141).  Note  that  thanks  to  the  normalization  (138),  what 
is  displayed  here  is  the  array  of  correlation  coefficients  of  the  process,  i.e. 
the  covariance  between  two  points  normalized  by  the  product  of  the  stan¬ 
dard  deviation  at  each  point.  The  diagonal  of  the  matrix  thus  is  unity,  and 
the  off-diagonal  terms  decay  exponentially  away  from  the  diagonal.  In  [11] 
this  correlation  coefficient  matrix  is  transformed  using  various  wavelet  bases, 
i.e.  the  matrix  undergoes  a  similarity  transformation  with  respect  to  the 
basis  representing  the  wavelet  transform  based  on  a  variety  of  QMF  filters, 
h{n).  This  transformation  corresponds  essentially  to  the  separable  form  of 
the  2D  wavelet  transform[16].  Figures  7,8  are  the  images  of  the  correlation 
coefficient  matrix  in  Figure  6  transformed  using  QMF  filters  of  length  2  and 
8,  respectively.  That  is,  these  are  the  correlation  coefficient  matrices  for 
the  multiscale  wavelet  coefficients  of  a  finite  length  segment  of  a  Ist-order 
Gauss-Markov  process,  where  the  finest  level  wavelet  coefficients  are  located 
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in  the  bottom  half  of  the  coefficient  vector,  the  next  coarser  level  coefficients 
comprise  the  next  fourth  of  the  vector,  the  next  set  fills  the  next  eighth,  etc. 
Note  that  aside  from  the  finger-like  patterns  in  these  images,  the  off-diagonal 
elements  are  essentially  zeroed.  The  finger  patterns  correspond  to  correla¬ 
tions  between  wavelet  coefficients  at  different  scales  which  share  the  same 
location  in  the  interval.  Note  that  even  these  correlations  are  weak.  Further¬ 
more,  since  the  variances  of  many  of  the  wavelet  coefficients  are  actually  quite 
small,  the  normalization  we  have  introduced  by  displaying  correlation  coef¬ 
ficients  actually  boosts  the  magnitude  of  many  of  the  off-diagonal  terms,  so 
that  the  approximate  whitening  of  this  process  performed  by  wavelet  trans¬ 
forms  is  even  better  than  these  figures  would  suggest.  Note  that  analogous 
observations  have  been  made  for  other  processes,  such  as  fractional  Brown¬ 
ian  motions  [14,  24],  suggesting  a  rather  broad  applicability  of  the  methods 
described  here. 

To  continue,  the  low  level  of  inter-scale  correlation  in  the  wavelet  rep¬ 
resentation  of  the  Gauss-Markov  process  as  illustrated  in  Figures  7  and  8 
motivates  the  approximation  of  the  wavelet  coefficients  of  this  process  as  un¬ 
correlated.  This  results  in  a  lattice  model  precisely  as  defined  in  (17-19).  We 
use  this  model  as  an  approximation  to  the  Gauss-Markov  process  in  order 
to  do  fixed  interval  smoothing.  In  particular,  the  class  of  models  which  we 
consider  as  approximations  to  Gauss-Markov  processes  is  obtained  precisely 
in  the  manner  just  described.  That  is,  we  construct  models  as  in  eq.’s(17- 
19)  where  the  wavelet  coefficients  are  assumed  to  be  mutually  uncorrelated. 
In  this  case  the  variances  of  the  wavelet  coefficients,  w{m)  in  eq.’s(17-19), 
are  determined  by  doing  a  similarity  transform  on  the  covariance  matrix 
of  the  process  under  investigation  using  a  wavelet  transform  based  on  the 
Daubechies  FIR  filters[8].  In  particular  if  Px  denotes  the  true  covariance 
matrix  of  the  process,  V  the  diagonal  matrix  of  wavelet  coefficient  variances, 
and  W  is  the  wavelet  transform  matrix,  then 

A  =  WPxW^  (147) 

V  =  WPapj,rorW^  (148) 

Thus,  this  approximate  model  corresponds  to  assuming  that  A  is  diagonal 
(i.e.  to  neglecting  its  off-diagonal  elements). 

In  our  examples  we  use  the  2-tap  Haar  QMF  filter  as  well  as  the  4- 
tap,  6-tap,  and  8-tap  Daubechies  QMF  filters[8].  Note  that  in  adapting  the 
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wavelet  transform  to  the  finite  interval  we  have,  for  simplicity,  used  cyclic 
wrap-around  in  our  wavelet  transforms  rather  than  the  exact  finite  interval 
wavelet  eigenvectors  described  in  the  preceding  section.  In  this  case  the 
number  of  points  at  each  scale  is  half  the  number  of  points  at  the  next  finer 
scale. 

5.2  Smoothing  Processes  Using  Multiscale  Models 

In  this  section  we  present  examples  in  which  we  compare  the  performance  of 
the  optimal  estimator  for  a  Ist-order  Gauss-Markov  process  with  that  of  the 
suboptimal  estimator  based  on  our  multiscale  approximate  model. 

Let  x{t)^t  =  0,...,N  —  1  denote  a  finite  window  of  our  Gauss-Markov 
process,  and  consider  the  white-noise-corrupted  observations. 


y{t)  =  x{t)  +  v{t)  (149) 

E[v^{t)]  =  R  (150) 

r  =  {y(f)|t  =  0,...iV-l}  (151) 


Let  the  optimal  smoothed  estimate  (implemented  using  the  correct  Gauss- 
Markov  model)  be  denoted  as 


i,(i)  =  B[i(i)|i'l 


(152) 


Letting  x  and  Xs  denote  the  vectors  of  samples  of  x{t)  and  Xs{t),  respec¬ 
tively,  we  can  define  the  optimal  smoothing  error  covariance 


(153) 


Also  if  Px  denotes  the  covariance  of  x  the  optimal  estimate  is  given  by 

=  LxY  (154) 

with 

Lx  =  P.{P.  +  RI)-^  (155) 

and 
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Tiopt  =  Px  —  PxiPx  +  RI)  ^Px  (156) 

More  generally  if  we  consider  any  estimator  of  the  form  of  (154)  (such  as 
the  one  we  will  consider  where  Lx  corresponds  to  the  optimal  smoother  for 
our  multiscale  approximate  model  for  the  Gauss-Markov  process),  then  the 
corresponding  error  covariance  is  given  by 

Ttsub  =  -S[(a:  -  Xsub)(x  -  :Esu6)^]  (157) 

=  {I  -  L,)Px{I  -  +  LxRLl  (158) 

We  now  give  several  examples  demonstrating  the  performance  of  our  mul¬ 
tiscale  models  in  smoothing  Gauss-Markov  processes.  We  focus  in  this  sub¬ 
section  on  the  case  of  a  single  scale  of  data  at  the  finest  scale.  In  Fig.’s  9-13 
we  compare  the  performance  of  the  optimal  estimator,  with  the  performance 
of  our  suboptimal  estimator  based  on  lattice  models  for  both  2-tap  and  8- 
tap  Daubechies  filters.  In  these  examples  the  measurement  noise  variance 
R  —  .5;  i.e.  the  data  is  of  SNR  =  1.4142. 

Note  the  strikingly  similar  performances  of  the  optimal  and  subopti¬ 
mal  smoothers,  as  illustrated  in  Figure  12  for  the  case  of  the  2-tap  lattice 
smoother.  From  visual  inspection  of  the  results  of  the  two  smoothers  it  is  dif¬ 
ficult  to  say  which  does  a  better  job  of  smoothing  the  data;  it  seems  one  could 
make  a  case  equally  in  favor  of  the  standard  smoother  and  the  lattice-model 
smoother.  The  similarity  in  performance  of  the  optimal  smoother  and  our 
lattice  smoothers  is  even  more  dramatic  for  the  case  of  the  8-tap  smoother 
as  illustrated  in  Figure  13. 

Note  that  although  the  standard  smoother  results  in  a  smaller  average 
smoothing  error  (the  trace  of  Siopt  divided  by  the  number  of  points  in  the 
interval),  it  seems  the  average  error  of  our  lattice-model  smoothers  is  not 
that  much  larger.  To  quantify  these  observations  let  us  define  the  variance 
reduction  of  a  smoother  as  follows. 

p  =  variance  reduction  (159) 

_  Po-Ps 
Po 

po  =  average  process  variance  (160) 

Ps  =  average  smoothing  error  variance  (161) 
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2-tap 

4-tap 

6-tap 

8-tap 

SNR  =  2.8284 

1.07  % 

.550  % 

.402  % 

.334  % 

SNR  =  1.412 

3.27  % 

1.77  % 

1.24  % 

1.04  % 

SNR  =  .7071 

6.71  % 

4.13  % 

2.70  % 

2.33  % 

SNR  =  .5 

9.58  % 

6.14  % 

3.87  % 

3.27  % 

Table  1:  Performance  Degradation  Comparison  of  Lattice-Model  Smoothers 
-  2-tap,  4-tap,  6-tap  and  8-tap 

We  also  define  the  performance  degradation  resulting  from  using  a  lattice 
smoother  as  compared  with  using  the  standard  smoother  as  follows. 

Aperf  =  performance  degradation  (162) 

—  f^standard  ~  f^lattice 
^standard 

/^standard  “  variance  reduction  of  standard  smoother  (163) 

^lattice  ~  variance  reduction  of  lattice-model  smoother  (164) 

Table  1  shows  the  performance  degradation  of  the  lattice-model  smoother 
relative  to  the  standard  smoother  for  filter  tap  orders  2,  4,  6,  and  8  and  for 
four  different  noise  scenarios:  1)  SNR  =  2.8284  2)  SNR  =  1.412  3)  SNR  = 
.7071  4)  SNR  =  .5.  The  variance  reductions  are  computed  using  smoothing 
errors  averaged  over  the  entire  interval.  While  the  degradation  in  perfor¬ 
mance  lessens  as  the  order  of  the  filter  increases,  a  great  deal  of  the  vari¬ 
ance  reduction  occurs  just  using  a  2-tap  filter.  For  example  for  the  case  of 
SNR  =  1.412  the  standard  smoother  yields  a  variance  reduction  of  85  per¬ 
cent.  It  is  arguable  whether  there  is  much  to  be  gained  in  using  an  8-tap  filter 
when  its  relative  decrease  in  performance  degradation  is  only  2.23  percent 
over  the  2-tap  smoother;  i.e.  the  variance  reduction  of  the  8-tap  smoother 
is  83.8  percent  while  the  variance  reduction  of  the  2-tap  smoother  is  already 
81.9  percent. 

The  performance  degradation  numbers  for  the  lower  SNR  case  (SNR  = 
.7071)  seem  to  suggest  that  the  effect  of  raising  the  noise  is  to  decrease  the 
performance  of  the  lattice-model  smoothers.  But  one  should  keep  in  mind 
that  this  decrease  is  at  most  only  marginal.  Consider  the  case  where  the 
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SNR  =  .5.  In  this  case  the  data  is  extremely  noisy,  the  noise  power  is  double 
that  of  the  case  where  SNR  =  .7071,  and  yet  the  performance  degradation 
in  using  the  2-tap  smoother  compared  with  the  standard  smoother  is  9.58 
percent,  up  only  2.87  percent  from  the  case  of  SNR  =  .7071.  Furthermore, 
if  one  examines  a  plot  (Fig  14)  of  the  results  of  applying  the  two  smoothers 
on  such  extremely  noisy  data,  the  performance  of  the  two  smoothers  is  as 
before  quite  comparable. 

We  emphasize  that  the  average  performance  degradation  is  a  scalar  quan¬ 
tity,  and  at  best  gives  only  a  rough  measure  of  estimation  performance.  From 
this  quantity  it  is  difficult  to  get  any  idea  of  the  qualitative  features  of  the  es¬ 
timate.  The  plots  of  the  sample  path  and  its  various  smoothed  estimates  over 
the  entire  interval  offer  the  reader  much  richer  evidence  to  judge  for  himself 
what  the  relative  differences  are  in  the  outputs  of  the  various  smoothers. 

The  preceding  analysis  indicates  that  multiscale  models  can  well-approximate 
the  statistical  characteristics  of  Ist-order  Gauss-Markov  processes  in  that 
nearly  equivalent  smoothing  performance  can  be  obtained  with  such  models. 
Further  corroboration  of  this  can  be  found  in  [5]  where  Bhattacharya  dis¬ 
tance  is  used  to  bound  the  probability  of  error  in  deciding,  based  on  noisy 
observations  as  in  (149),  if  a  given  stochastic  process  x(t)  is  either  a  Ist-order 
Gauss-Markov  process  or  the  corresponding  multiscale  process  obtained  by 
ignoring  interscale  correlations.  An  important  point  here  is  that  the  Ist-order 
Gauss-Markov  model  is  itself  an  idealization,  and  we  would  argue  that  our 
multiscale  models  are  an  equally  good  idealization.  Indeed  if  one  takes  as  an 
informal  definition  of  a  “useful”  model  class  that  (a)  it  should  be  rich  enough 
to  capture,  with  reasonable  accuracy,  important  classes  of  physically  mean¬ 
ingful  stochastic  processes;  and  (b)  it  should  be  amenable  to  detailed  analysis 
and  lead  to  efficient  and  effective  algorithms,  then  we  would  argue  that  our 
multiscale  models  appear  to  have  some  decided  advantages  as  compared  to 
standard  models.  In  particular  not  only  do  we  obtain  efficient,  highly  paral¬ 
lel  algorithms  for  the  smoothing  problems  considered  in  this  section  but  we 
also  obtain-equally  efficient  algorithms  for  problems  such  as  multiscale  data 
fusion,  which  we  discuss  next. 

5.3  Sensor  Fusion 

In  this  section  we  provide  examples  that  show  how  easily  and  effectively  our 
framework  handles  the  problem  of  fusing  multiscale  data  to  form  optimal 
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smoothed  estimates.  In  our  framework  not  only  is  there  no  added  algorithmic 
complexity  to  the  addition  of  multiscale  measurements  but  it  is  also  easy  for 
us  to  evaluate  the  performance  of  our  smoothers  in  using  multiscale  data. 

For  simplicity  we  focus  here  on  the  problem  of  fusing  data  at  two  scales. 
Consider  the  Gauss-Markov  process  used  in  our  previous  examples  as  defined 
in  (141).  We  assume  that  we  have  fine-scale,  noisy  measurements  as  in  (149) 
together  with  one  coarser  level  of  measurements.  In  particular,  as  before, 
the  length  of  our  interval  is  taken  to  be  128  points.  Thus  we  assume  that  we 
have  2^  =  128  measurements  of  the  finest  scale  version  of  our  signal  as  well 
as  2^  measurements  of  the  coarser  approximation  of  our  signal  at  scale  K.  ^ 

Consider  the  case  where  our  fine  scale  measurements  are  of  extremely 
poor  quality.  In  particular  we  take  the  case  where  our  data  is  of  SNR  = 
.3536  (the  noise  power  is  eight  times  the  signal  power).  Figure  15  compares 
the  result  of  using  the  standard  smoother  on  these  data  with  the  result  of 
using  a  4-tap  lattice  model  smoother  on  the  same  data.  The  performance  of 
the  two  smoothers  is  comparable  as  we  would  expect  from  our  results  in  the 
previous  section. 

Now  we  use  the  same  data  and  consider  fusing  a  higher  quality  coarse 
scale  data  set  to  form  a  smoothed  estimate.  In  particular  we  take  our  coarse 
data  to  reside  at  the  scale  one  level  coarser  than  the  original  data  (scale  at 
which  there  are  64  points)  and  the  coarsening  operator.  Hi,  corresponds  to  a 
4-tap  filter.  The  SNR  of  this  coarse  data  is  equal  to  2.  Figure  16  compares 
the  result  of  using  the  standard  smoother  on  the  low  quality  fine  scale  data 
alone  with  the  result  of  using  our  4-tap  lattice  smoother  to  fuse  this  low 
quality  data  with  high  quality  coarse  data. 

Note  that  the  coarse  measurement  aids  dramatically  in  improving  the 
quality  of  the  estimate  over  the  use  of  just  fine-scale  data  alone.  To  quantify 
this  recall  that  our  smoother  computes  the  smoothing  error  at  each  scale. 
We  use  these  errors  as  approximations  to  the  actual  suboptimal  errors  (note 
that  the  computation  of  the  actual  error  covariance  from  multiscale  data  is 
appreciably  more  complex  than  for  the  case  of  single  scale  measurements; 
the  same  is  not  true  for  our  tree  models,  where  the  complexity  of  the  two 
cases  is  essentially  the  same).  The  variance  reduction  in  the  case  of  fusing 

^Note  that  as  mentioned  previously,  the  lattice  models  used  in  this  section  correspond 
exactly  to  the  wavelet  transform,  i.e.  to  (17)  -  (19),  so  that  the  signal  x(K)  is  precisely 
the  vector  of  scaling  coefficients  at  scale  K  of  the  fine  scale  signal  x{M). 
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the  two  measurement  sets  is  97  percent  versus  only  36  percent  for  the  case 
of  using  only  the  poor  quality  fine-scale  data. 

To  explore  even  further  the  idea  of  fusing  coarse  measurements  with  poor 
quality  fine  measurements  we  compare  the  results  of  using  coarse  measure¬ 
ments  of  various  degrees  of  coarseness  in  order  to  determine  how  the  scale  of 
the  coarse  data  affects  the  resolution  of  the  smoothed  estimate.  In  particular, 
we  take  our  fine  scale  data  to  be  the  same  as  that  in  Figure  16.  However,  we 
supplement  this  data  with  coarse  measurements  of  extremely  high  quality 
(SNR  =  31.6)  and  consider  several  cases:  1)  the  coarse  data  is  at  a  scale 
at  which  there  are  64  points  2)  the  coarse  data  is  at  a  scale  at  which  there 
are  32  points  3)  the  coarse  data  is  at  a  scale  at  which  there  are  16  points. 
Figures  17-19  compare  the  original  signal  with  its  smoothed  estimates  using 
coarse  data  at  the  three  different  scales.  Note  how  the  estimates  in  these 
figures  adapt  automatically  to  the  quality  and  resolution  of  the  data  used  to 
produce  them. 

6  Conclusions 

In  this  paper  we  have  described  a  class  of  multiscale,  stochastic  models  mo¬ 
tivated  by  the  scale-to-scale  recursive  structure  of  the  wavelet  transform.  As 
we  have  described,  the  eigenstructure  of  these  models  is  such  that  the  wavelet 
transform  can  be  used  to  convert  the  dynamics  to  a  set  of  simple,  decoupled 
dynamic  models  in  which  scale  plays  the  role  of  the  time-like  variable.  This 
structure  then  led  us  directly  to  extremely  efficient,  scale- recursive  algorithms 
for  optimal  estimation  based  on  noisy  data.  A  most  significant  aspect  of  this 
approach  is  that  it  directly  applies  in  cases  in  which  data  of  differing  resolu¬ 
tions  are  to  be  fused,  yielding  computationally  efficient  solutions  to  new  and 
important  classes  of  data  fusion  problems. 

In  addition  we  have  shown  that  this  modeling  framework  can  produce  ef¬ 
fective  models  for  important  classes  of  processes  not  captured  exactly  by  the 
framework.  In  particular  we  have  illustrated  the  potential  of  our  approach 
by  constructing  and  analyzing  the  performance  of  multiscale  estimation  al¬ 
gorithms  for  Gauss-Mar kov  processes.  Furthermore  we  have  shown  how  the 
problem  of  windowing  —  i.e.  the  availability  of  only  a  finite  window  of  data 
-  can  be  dealt  with  by  a  slight  modification  of  the  wavelet  transform.  Fi¬ 
nally,  while  what  we  have  presented  here  certainly  holds  considerable  promise 
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for  1-D  signal  processing  problems,  the  payoffs  for  multidimensional  signals 
should  be  even  greater.  In  particular  the  identification  of  scale  as  a  time¬ 
like  variable  holds  in  several  dimensions  as  well,  so  that  our  scale-recursive 
algorithms  provide  potentially  substantial  computational  savings  in  contexts 
in  which  the  natural  multidimensional  index  variable  (e.g.  space)  does  not 
admit  natural  “directions”  for  recursion. 
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Figure  2:  Parallel  ID  Smoothing  -  Down-up 
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Figure  3:  Transformation  of  a  10-pt.  Sequence  x{n)  into  its  6-pt.  Scaling 
Coefficients  c(n)  and  its  6-pt.  Wavelet  Coefficients  d{n) 
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Figure  4:  Lattice  Representing  Domain  of  Scaling  Coefficients  for  2-scale 
Decomposition  Based  on  Zeroing  Edge  Scaling  Coefficients 
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Figure  5:  Lattice  Representing  Domain  of  the  Wavelet  Coefficients  for  2-scale 
Decomposition  Based  on  Zeroing  Edge  Scaling  Coefficients 
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Figure  6:  Covariance  Matrix  of  a  Stationary  Gauss-Markov  Process 


42 


Figure  9:  Sample  Path  of  a  Stationary  Gauss-Markov  Process  (solid)  and  Its 
Noisy  Version  with  SNR=  1.4142  (dashed) 
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Figure  10:  Stationary  Gauss-Markov  Process  (solid)  and  Its  Smoothed  Ver¬ 
sion  (dashed)  Using  Standard  Minimum  Mean-Square  Error  Smoother  (Data 
of  SNR=1.4142) 
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Figure  11:  Stationary  Gauss-Markov  Process  (solid)  versus  Multiscale 
Smoother  Using  2-Tap  (dashed)  (Data  of  SNR=1.4142) 
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Figure  12:  Standard  Minimum  Mean-Square  Error  Smoother  (solid)  versus 
Multiscale  Smoother  Using  2-Tap  (dashed)  (Data  of  SNR=1.4142) 
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Figure  13:  Standard  Minimum  Mean-Square  Error  Smoother  (solid)  versus 
Multiscale  Smoother  Using  8-Tap  (dashed)  (Data  of  SNR=1.4142) 
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Figure  14:  Sample  Path  of  a  Stationary  Gauss-Markov  Process  (solid),  Stan¬ 
dard  Smoother  (dotted),  2-Tap  Smoother  (dashed)  (Data  of  SNR=.5) 
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Figure  15:  Sample  Path  of  Stationary  Gauss-Markov  Process  (solid),  Result 
of  Standard  Smoother  on  Poor  Data  of  SNR=.3536  (dotted),  Result  of  4-tap 
Lattice  Smoother  on  Same  Data  (dashed) 
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Figure  16:  Sample  Path  of  Stationary  Gauss-Markov  Process  (solid),  Result 
of  Standard  Smoother  on  Fine  Data  of  SNR  =  .3536  (dotted),  Result  of  4-tap 
Lattice  Smoother  on  Same  Data  Supplemented  with  Coarse  Data  of  SNR  = 
2  (dashed) 
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Figure  17:  Sample  Path  of  Stationary  Gauss-Markov  Process  (solid),  Results 
of  4-tap  Lattice  Smoother  Using  Fine  Data  of  SNR  =  .3536  Supplemented 
with  Coarse  Data  of  SNR  =  31.6:  Coarse  Data  at  64  pt.  Scale  (dashed) 
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Figure  18;  Sample  Path  of  Stationary  Gauss-Markov  Process  (solid),  Results 
of  4-tap  Lattice  Smoother  Using  Fine  Data  of  SNR  =  .3536  Supplemented 
with  Coarse  Data  of  SNR  =  31.6:  Coarse  Data  at  32  pt.  Scale  (dashed) 
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Figure  19:  Sample  Path  of  Stationary  Gauss-Markov  Process  (solid),  Results 
of  4-tap  Lattice  Smoother  Using  Fine  Data  of  SNR  =  .3536  Supplemented 
with  Coarse  Data  of  SNR  =  31.6:  Coarse  Data  at  16  pt.  Scale  (dashed) 
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